 video prediction


Unsupervised Learning for Physical Interaction through Video Prediction

Neural Information Processing Systems

A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. However, to scale real-world interaction learning to a variety of scenes and objects, acquiring labeled data becomes increasingly impractical. To learn about physical object motion without labels, we develop an action-conditioned video prediction model that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames. Because our model explicitly predicts motion, it is partially invariant to object appearance, enabling it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, we also introduce a dataset of 59,000 robot interactions involving pushing motions, including a test set with novel objects. In this dataset, accurate prediction of videos conditioned on the robot's future actions amounts to learning a visual imagination of different futures based on different courses of action. Our experiments show that our proposed method produces more accurate video predictions both quantitatively and qualitatively, when compared to prior methods.
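
As a minimal sketch of what "predicting a distribution over pixel motion" could look like in practice (not the authors' implementation), the function below assumes a model has already produced, for every pixel, a normalized distribution over a small set of displacement offsets. The next frame is then the expected value of the previous frame's pixels under that distribution. The names `apply_motion_distribution`, `motion_probs`, and `offsets` are hypothetical, and clamping at the image border is an assumption made for illustration.

```python
def apply_motion_distribution(prev_frame, motion_probs, offsets):
    """Predict the next frame as an expectation over per-pixel motion.

    prev_frame:   2D list of floats, shape H x W (grayscale for simplicity).
    motion_probs: motion_probs[y][x] is a list of probabilities, one per
                  offset, assumed to sum to 1 (e.g. a softmax output).
    offsets:      list of (dy, dx) displacements; next[y][x] draws from
                  prev[y + dy][x + dx].
    """
    height, width = len(prev_frame), len(prev_frame[0])
    next_frame = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            value = 0.0
            for prob, (dy, dx) in zip(motion_probs[y][x], offsets):
                # Clamp source coordinates at the border (an assumption).
                src_y = min(max(y + dy, 0), height - 1)
                src_x = min(max(x + dx, 0), width - 1)
                value += prob * prev_frame[src_y][src_x]
            next_frame[y][x] = value
    return next_frame


# Toy example: every pixel is certain its content comes from its left
# neighbor, so the image content shifts one pixel to the right.
offsets = [(0, 0), (0, -1)]
prev_frame = [[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]]
motion_probs = [[[0.0, 1.0] for _ in range(3)] for _ in range(2)]
shifted = apply_motion_distribution(prev_frame, motion_probs, offsets)
```

Because the output depends on the input pixels only through the predicted motion, the same motion distribution can be applied to objects of any appearance, which illustrates the partial invariance to object appearance that the abstract describes.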



MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Neural Information Processing Systems

We can see that this is enough time for two different painted arrows to pass under the car. If one zooms in, one can inspect the relative positions of the arrow and the Mercedes hood ornament in the real versus predicted frames.




OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning

Neural Information Processing Systems

OpenSTL provides a modular and extensible framework implementing various state-of-the-art methods. We conduct standard evaluations on datasets across various domains, including synthetic moving object trajectory, human motion, driving scenes, traffic flow, and weather forecasting.





Video Prediction via Selective Sampling

Jingwei Xu, Bingbing Ni, Xiaokang Yang

Neural Information Processing Systems

This module is trained in an adversarial learning manner [5]. The Selection module selects high-probability candidates from the proposals and combines them to produce the final prediction, according to the criterion of better position matching.